11 research outputs found

A Simple Correlation-Based Model of Intelligibility for Nonlinear Speech Enhancement and Separation

Author: Boldt Jesper B.
Ellis Daniel P. W.
Publication date: 01/01/2009

Applying a binary mask to a pure noise signal can result in speech that is highly intelligible, despite the absence of any of the target speech signal. Therefore, to estimate the intelligibility benefit of highly nonlinear speech enhancement techniques, we contend that SNR is not useful; instead we propose a measure based on the similarity between the time-varying spectral envelopes of target speech and system output, as measured by correlation. As with previous correlation-based intelligibility measures, our system can broadly match subjective intelligibility for a range of enhanced signals. Our system, however, is notably simpler and we explain the practical motivation behind each stage. This measure, freely available as a small Matlab implementation, can provide a more meaningful evaluation measure for nonlinear speech enhancement systems, as well as providing a transparent objective function for the optimization of such systems.
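As a rough illustration of the idea in this abstract, the sketch below scores similarity between two sets of time-varying band envelopes by averaging a per-band Pearson correlation. It is not the authors' Matlab implementation: the function names, the list-of-bands data layout, and the plain averaging across bands are all assumptions made for the example, and envelope extraction itself is left out.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def envelope_similarity(target_env, output_env):
    """Average per-band correlation between two envelope sets.

    Hypothetical layout: each argument is a list of frequency bands,
    and each band is a list of per-frame envelope values.
    """
    scores = [pearson(t, o) for t, o in zip(target_env, output_env)]
    return sum(scores) / len(scores)

# Toy envelopes: two bands, four frames each (made-up numbers).
target = [[0.1, 0.9, 0.4, 0.8], [0.5, 0.2, 0.7, 0.1]]
scaled = [[2.0 * v for v in band] for band in target]    # same shape, different gain
flipped = [[-v for v in band] for band in target]        # inverted envelopes

print(envelope_similarity(target, scaled))   # close to 1: gain changes do not matter
print(envelope_similarity(target, flipped))  # close to -1
```

Because correlation ignores overall gain, a score like this responds to the shape of the envelopes rather than to signal level, which matches the abstract's argument against SNR for heavily nonlinear processing.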

CiteSeerX

Columbia University Academic Commons

VBN

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Model-based Speech Enhancement for Intelligibility Improvement in Binaural Hearing Aids

Author: Boldt Jesper B.
Christensen Mads G.
Kavalekalam Mathew Shaji
Nielsen Jesper K.
Publication date: 01/10/2018

Speech intelligibility is often severely degraded among hearing-impaired individuals in situations such as the cocktail party scenario. The performance of the current hearing aid technology has been observed to be limited in these scenarios. In this paper, we propose a binaural speech enhancement framework that takes into consideration the speech production model. The enhancement framework proposed here is based on the Kalman filter that allows us to take the speech production dynamics into account during the enhancement process. The usage of a Kalman filter requires the estimation of clean speech and noise short-term predictor (STP) parameters, and the clean speech pitch parameters. In this work, a binaural codebook-based method is proposed for estimating the STP parameters, and a directional pitch estimator based on the harmonic model and maximum likelihood principle is used to estimate the pitch parameters. The proposed method for estimating the STP and pitch parameters jointly uses the information from left and right ears, leading to a more robust estimation of the filter parameters. Objective measures such as PESQ and STOI have been used to evaluate the enhancement framework in different acoustic scenarios representative of the cocktail party scenario. We have also conducted subjective listening tests on a set of nine normal-hearing subjects, to evaluate the performance in terms of intelligibility and quality improvement. The listening tests show that the proposed algorithm, even with access to only a single-channel noisy observation, significantly improves the overall speech quality, and the speech intelligibility by up to 15%.
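To make the Kalman filtering step in this abstract more concrete, here is a deliberately simplified sketch: a scalar Kalman filter for a first-order autoregressive signal observed in white noise. It is not the paper's framework. The AR(1) model, the known parameters `a`, `q` and `r`, and the synthetic data are assumptions for illustration; the paper estimates higher-order STP and pitch parameters binaurally with a codebook, which is not reproduced here.

```python
import random

def kalman_ar1(y, a, q, r):
    """Scalar Kalman filter for s[n] = a*s[n-1] + w[n], y[n] = s[n] + v[n].

    a: AR(1) coefficient, q: variance of w, r: variance of v
    (all assumed known in this sketch).
    """
    s_hat, p = 0.0, 1.0          # initial state estimate and its variance
    estimates = []
    for obs in y:
        # Predict the next state from the signal model.
        s_pred = a * s_hat
        p_pred = a * a * p + q
        # Correct the prediction with the noisy observation.
        gain = p_pred / (p_pred + r)
        s_hat = s_pred + gain * (obs - s_pred)
        p = (1.0 - gain) * p_pred
        estimates.append(s_hat)
    return estimates

# Synthetic test signal (made-up parameters, fixed seed for repeatability).
rng = random.Random(0)
a, q, r = 0.95, 0.1, 1.0
clean, noisy, s = [], [], 0.0
for _ in range(2000):
    s = a * s + rng.gauss(0.0, q ** 0.5)
    clean.append(s)
    noisy.append(s + rng.gauss(0.0, r ** 0.5))

filtered = kalman_ar1(noisy, a, q, r)
mse_noisy = sum((n - c) ** 2 for n, c in zip(noisy, clean)) / len(clean)
mse_filtered = sum((f - c) ** 2 for f, c in zip(filtered, clean)) / len(clean)
print(mse_noisy, mse_filtered)   # the filtered error should be the smaller one
```

The point of the sketch is only the predict/correct structure: the signal model supplies a prediction, and the gain decides how much to trust each noisy sample.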

arXiv.org e-Print Archive

VBN

Model based Binaural Enhancement of Voiced and Unvoiced Speech

Author: Boldt Jesper B.
Christensen Mads Græsbøll
Kavalekalam Mathew Shaji
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017

Crossref

VBN

Pitch-based non-intrusive objective intelligibility prediction

Author: Boldt Jesper B.
Christensen Mads G.
Sorensen Charlotte
Xenaki Angeliki
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 16/06/2017

Crossref

VBN

Experimental Study of Generalized Subspace Filters for the Cocktail Party Situation

Author: Boldt Jesper B.
Christensen Knud Bank
Christensen Mads Græsbøll
Gran Fredrik
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

VBN

Single channel speech enhancement in the modulation domain: New insights in the modulation channel selection framework

Author: Bertelsen Andreas Thelander
Boldt Jesper B.
Dau Torsten
Gran Fredrik
Jorgensen Soren
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2015

Crossref

Online Research Database In Technology

The He-rich core-collapse supernova 2007Y: Observations from X-ray to Radio Wavelengths

A detailed study spanning approximately a year has been conducted on the Type Ib supernova 2007Y. Imaging was obtained from X-ray to radio wavelengths, and a comprehensive set of multi-band (w2m2w1u'g'r'i'UBVYJHKs) light curves and optical spectroscopy is presented. A virtually complete bolometric light curve is derived, from which we infer a (56)Ni-mass of 0.06 M_sun. The early spectrum strongly resembles SN 2005bf and exhibits high-velocity features of CaII and H_alpha; during late epochs the spectrum shows evidence of an ejecta-wind interaction. Nebular emission lines have similar widths and exhibit profiles that indicate a lack of major asymmetry in the ejecta. Late phase spectra are modeled with a non-LTE code, from which we find (56)Ni, O and total-ejecta masses (excluding He) to be 0.06, 0.2 and 0.42 M_sun, respectively, below 4,500 km/s. The (56)Ni mass confirms results obtained from the bolometric light curve. The oxygen abundance suggests the progenitor was most likely a ~3.3 M_sun He core star that evolved from a zero-age-main-sequence mass of 10-13 M_sun. The explosion energy is determined to be ~10^50 erg, and the mass-loss rate of the progenitor is constrained from X-ray and radio observations to be <~10^-6 M_sun/yr. SN 2007Y is among the least energetic normal Type Ib supernovae ever studied.

arXiv.org e-Print Archive

Crossref

Texas A&M Repository

Copenhagen University Research Information System

Caltech Authors

MPG.PuRe

Binaural speech enhancement using a codebook based approach

Author: Boldt Jesper B.
Christensen Mads Græsbøll
Kavalekalam Mathew Shaji
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2016

VBN

Kalman filter for speech enhancement in cocktail party scenarios using a codebook-based approach

Author: Boldt Jesper B.
Christensen Mads Græsbøll
Gran Fredrik
Kavalekalam Mathew Shaji
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/04/2016

VBN

An evaluation of objective measures for intelligibility prediction of time-frequency weighted noisy speech

Author: Beerends J. G.
Boldt J. B.
Cees H. Taal
Hansen J. H. L.
Hendriks R. C.
Itakura F.
Jesper Jensen
Kitawaki N.
Klatt D.
Liu W. M.
Ludvigsen C.
Paliwal K. K.
Preminger J.
Richard C. Hendriks
Richard Heusdens
Sheskin D. J.
Taal C. H.
Tribolet J. M.
Yamada T.
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 30/11/2011

Existing objective speech-intelligibility measures are suitable for several types of degradation; however, it turns out that they are less appropriate in cases where noisy speech is processed by a time-frequency weighting. To this end, an extensive evaluation is presented of objective measures for intelligibility prediction of noisy speech processed with a technique called ideal time-frequency (TF) segregation. In total 17 measures are evaluated, including four advanced speech-intelligibility measures (CSII, CSTI, NSEC, DAU), the advanced speech-quality measure (PESQ), and several frame-based measures (e.g., SSNR). Furthermore, several additional measures are proposed. The study comprised a total number of 168 different TF-weightings, including unprocessed noisy speech. Out of all measures, the proposed frame-based measure MCC gave the best results (ρ = 0.93). An additional experiment shows that the good performing measures in this study also show high correlation with the intelligibility of single-channel noise reduced speech.

Crossref

TU Delft Repository